Introduction

With big data availability and advances in computational power and statistical methods, the reliance on advanced assessment techniques such as machine learning (ML) algorithms will continue to grow. Despite the obvious value of such technologies (e.g., faster and more engaging selection processes, potentially better prediction, personalized recommendations), there is evidence that they could perpetuate biases and thus may violate workplace antidiscrimination laws. This white paper is an effort to define the problem under the concept of algorithmic justice and to provide practical information and advice for the use of ML algorithms in workplace assessment. The white paper proceeds in four steps by:

1. defining the emerging concept and scope of algorithmic justice, 

2. outlining the research guidelines for ML algorithms used in workplace assessment, 

3. explaining why ML algorithms can be biased, and 

4. discussing potential legal threats when using algorithms in workplace assessment. 


The term algorithmic justice was coined by Joy Buolamwini, who also founded the Algorithmic Justice League at MIT to move toward equitable and accountable use of artificial intelligence (AI). Specifically, algorithmic justice is evolving from practical concerns with the accuracy of ML algorithms in classifying individuals from protected classes. The widespread use of ML algorithms compounds this concern. ML algorithms are being used in various contexts such as medical diagnosis (e.g., using facial cues as predictors of health indicators such as fat percentage, body mass index, and blood pressure; Stephen et al., 2017), law enforcement (e.g., using the Next Generation Identification-Interstate Photo System to help solve crimes), and employment.


Currently, ML algorithms’ key promise of faster and more accurate prediction of human characteristics is more desired than accomplished. First, a controversial use of ML algorithms is facial recognition. For example, Buolamwini and Gebru (2018) found that these algorithms can discriminate based on race and gender. A potential reason for these differences in classification accuracy is that when facial images are captured in non-ideal, everyday conditions, algorithms cannot accurately detect the facial features (e.g., jaw drop, blink) that human coders can detect (Barrett et al., 2019). Second, ML algorithms have also been credited with predicting personality. However, this is an overstatement given the overall research findings. For example, Youyou et al.’s (2015) study of digital footprints found that when fed with only 300 Facebook likes, Extraversion scores produced by modified linear regression correlated more strongly with one’s self-reported Extraversion than the Extraversion scores provided by one’s spouse. At the same time, Nguyen et al. (2014) aimed to predict personality based on the automatic extraction of nonverbal cues but were successful only for Extraversion. More recently, Escalante et al. (2018) found positive judgment biases toward female subjects on all personality factors except Agreeableness, as well as negative biases toward African American subjects.


To underscore the potential issues with the widespread use of ML algorithms today, the following table presents prominent examples of algorithms gone wrong or right, as well as their practical outcomes and societal implications. Given such examples, it is not surprising that members of the U.S. Congress raised concerns with the Federal Trade Commission, Federal Bureau of Investigation, and Equal Employment Opportunity Commission regarding the potential harm associated with the application of such technologies. Senators have also proposed a bill on algorithmic accountability.
Company | Algorithm | Outcome | Social implications
--- | --- | --- | ---
Amazon* | Resumé reviewer: the algorithm was trained by observing patterns in resumés submitted to the company over a 10-year period. Most resumés were from men. | AI taught itself that male applicants were preferable, rewarding resumés that included words such as “executed” and “captured,” which are more often used by men, and penalizing resumés that included words such as “women.” | Could widen the gender gap in an already male-dominated industry.
District of Columbia Public Schools** | The evaluation system IMPACT rates teachers’ performance primarily on classroom observations and student test scores. Teacher performance ratings determine compensation and job security. The small number of students per teacher makes estimates based on student test scores statistically unsound. | IMPACT has resulted in the firing of many educators, placed hundreds more on notice, and left the rest frustrated and scared about their job security. | The influence of IMPACT is felt most by teachers in DC’s poorest school districts, where minority teachers tend to reside. It perpetuates workforce inequalities and exacerbates an already alarming shortage of teachers of color.
Starbucks/Kronos*** | A global labor scheduling system uses data on weather patterns, sales, and customer foot traffic to predict labor demand and schedule employees more efficiently. More employees were scheduled during busy times and fewer during slow times. | Workers were given irregular schedules, sometimes having to close the store at 11pm and return at 4am to open it. They also did not receive adequate notice of their new schedules (less than a week’s notice). | A single mother trying to work her way through college while working at Starbucks had to put school on hold due to irregular work hours and lack of scheduling notice.
Textio**** | Uses NLP on job listings, recruiting emails, and hiring results to identify language that may deter minority applicants from applying. | Helps make job descriptions more gender neutral. Attracts a more diverse group of candidates. | Increases the diversity of the applicant pool and subsequent hires.
Blendoor***** | Removes data from resumés that can result in algorithmic bias (e.g., name, address). | Helps remove conscious and unconscious biases from hiring. | Increases the diversity of the applicant pool and subsequent hires.


* https://slate.com/business/2018/10/amazon-artificial-intelligence-hiring-discrimination-women.html
** https://www.air.org/edsector-archives/publications/inside-impact-d-c-s-model-teacher-evaluation-system
*** https://searchhrsoftware.techtarget.com/news/4500252451/Kronos-shift-scheduling-software-a-grind-for-Starbucks-worker
**** https://www.upturn.org/reports/2018/hiring-algorithms/
***** https://medium.com/allraise/stephanie-lampkin-founder-ceo-of-blendoor-all-raises-enterprise-saas-womancrush-wednesday-5b6990570a28
Defining Algorithmic Justice


All ML algorithms follow predefined steps and aim to automatically detect patterns from data. Nowadays, algorithms are no longer used solely for mathematical computations but are tasked with solving complex problems such as predicting job performance. Each ML algorithm follows its own “statistical recipe” for summarizing and discovering relationships in data. In general, algorithms should not evoke fear because they have existed for centuries; for example, a knitting pattern is as much an algorithm as any of the natural language processing (NLP) ML algorithms. Developing algorithms for the assessment of humans is a field in which no single discipline, whether computer science, math, engineering, or psychology, can claim sole ownership. Multidisciplinary collaboration among psychologists, computer scientists, and engineers is overdue, but the real need is for transdisciplinary solutions (Liem et al., 2018) so that algorithms do not feed us unreliable and potentially illegal recommendations.


Industrial-organizational (I-O) psychology, however, might be useful for defining what a just or fair ML algorithm for workplace assessment is. This is because I-O psychology research addresses related topics such as the use of structured assessments for employment decision making. For example, Gilliland (1993) proposed an organizational justice model for selection procedures. As applied to fair ML algorithms, this model can be readily summarized in terms of the three types of justice that I-O psychologists recognize:
Distributive justice:
• Does the algorithm provide applicants with an assessment decision (note that this is not necessarily a hiring decision) commensurate with their knowledge, skills, abilities, and other characteristics (KSAOs)?
• Do the algorithm’s results lead to fair and equitable outcomes (e.g., hiring decisions) for those assessed?




Procedural justice:

• Does the algorithm assess job-related candidate characteristics (e.g., attention to detail) rather than random data on the candidate scraped from social media (i.e., data that do not give the candidate the opportunity to perform well for the purpose of the assessment)?

• Were these characteristics assessed accurately (i.e., with low measurement error) and thoroughly (e.g., not assessing attention to detail with a score from a 1-minute game)?

• Was the candidate able to object to the algorithm’s results?

• Was the candidate able to ask questions and prepare their performance to present themselves to the best of their abilities? For example, in automatically scored video interviews, was the candidate able to ask a question during the interview that allowed for a customized answer that helped the candidate during the rest of the interview?

• Did the algorithm assess candidates consistently? That is, did it use the same personal characteristics (e.g., the same personality scores) as predictors for every candidate?
Interpersonal justice:

• If the candidate was rejected, did the algorithm user give honest feedback to the candidate? Was the candidate told which personal characteristics and assessed variables were analyzed by the algorithm and how they led to the assessment decision?


If the answer to each of these questions is yes, then the algorithm is more likely to be perceived as a fair/just 
algorithm. 


Research Guidelines for Algorithmic Justice 


This section interprets the research guidelines of the Principles for the Validation and Use of Personnel Selection Procedures (Principles; SIOP, 2019) as they apply to ML algorithms used in workplace assessment. Below we discuss five primary guidelines as applied to ML algorithms.


First, ML algorithms used to harvest candidate data from resumés and from social media for professionals (e.g., LinkedIn) are considered modern selection procedures (p. 4). Just as for established selection methods (e.g., interviews, cognitive tests), the algorithm user must provide inferential evidence about the job relatedness of the algorithm scores. Notably, it is not the reliability of the ML algorithm that is evaluated per se but its validity. A valid ML algorithm is characterized by substantive theory and evidence supporting the inferences and interpretations about future job performance derived from the algorithmically produced scores (p. 5). Therefore, the algorithm provider should not present only evidence of the algorithm’s accuracy (i.e., reliability) and automatically infer that the algorithm is valid for predicting job-related outcomes.


Second, the Principles require that the provider specify the variables (called predictors by psychologists and features by computer scientists) that the testing procedure will measure (p. 20). Although it might be difficult to trace how the predictors and their combinations (like the “black box” neural layers in NLP) relate to the outcome in question (i.e., predicted future job performance), the Principles require that such analysis be performed individually for each predictor (pp. 21-22). Specifically, algorithmically produced selection scores need to be justified not only methodologically but also conceptually with regard to their linkage to the criterion/outcome variable (p. 22). Generally, scoring and transformations performed by the algorithm should be described as fully as possible (p. 63). Even if not published due to copyright, the algorithm’s computational model and validation should be documented. If the algorithm is contested legally, this document should be available so that an independent and statistically savvy evaluator of the selection procedure can reproduce the algorithm and its scores.


Third, because ML algorithms combine multiple (sometimes thousands of) predictors/features, algorithm developers and users need to consider and document how the algorithm combines these predictors (p. 25). Given the statistically complicated nature of some ML methods, assigning weights to each predictor as in multiple regression is unrealistic. This, however, does not mean that algorithm providers need not explain and document how the combination is performed and how types of predictors interrelate (e.g., correlations and covariances). For example, within the algorithm’s calculations, how do the features derived from facial recognition relate to features derived from the sentiment analysis of the candidates’ text? Furthermore, these interrelations should not be justified from only one research sample that is (often artificially) split into training and test samples to develop the algorithm; the interrelationships must be cross-validated in another sample (p. 26) or at least with multiple randomized training and test samples (Friedler et al., 2019). Finally, all samples used to train, test, and cross-validate the algorithm must be described in terms of demographic composition, population representation, and potential range restriction (p. 63).


Fourth, regarding the most used selection method (interviews), ML algorithm users need to be aware that selecting some candidates with a traditional human-proctored interview and others with an algorithm-scored interview is dangerous. Even if the latter is accurate and valid, candidates might receive assessments whose scores are not equivalent, because the candidates’ experiences in the two interview types differ and the assessed behaviors are no longer equivalent (p. 48).


Fifth, regarding cutoff scores, ML algorithm users evaluating candidates against a cutoff score should provide evidence of whether the predictors/features relate linearly to the outcome, that is, whether a higher score on each predictor relates to higher predicted future job performance. At the same time, it is advisable that users of algorithms document the selection goals and circumstances that led them to employ an algorithm with a cutoff score rather than an algorithm that ranks candidates (p. 58).
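One simple way to look for this evidence is to sort candidates by predictor score, split them into equal-sized bins, and check that the mean criterion value rises from bin to bin. The sketch below assumes paired predictor and criterion values from a validation sample; the four bins and the strict-increase check are illustrative choices, not a procedure prescribed by the Principles, and it tests for a rising relationship rather than strict linearity.

```python
from statistics import mean


def binned_means(predictor, criterion, n_bins=4):
    """Mean criterion value within equal-count bins of the predictor, lowest bin first."""
    pairs = sorted(zip(predictor, criterion))
    size = len(pairs) / n_bins
    bins = [pairs[round(i * size):round((i + 1) * size)] for i in range(n_bins)]
    return [mean(c for _, c in b) for b in bins]


def rises_monotonically(values):
    """True if every bin mean is higher than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


predictor = list(range(1, 13))  # hypothetical predictor scores
linear_ish = [2.0, 2.4, 2.2, 2.9, 3.1, 2.8, 3.4, 3.6, 3.3, 4.0, 3.8, 4.2]
inverted_u = [2.0, 2.2, 2.1, 3.5, 3.6, 3.4, 3.9, 4.0, 3.8, 2.5, 2.6, 2.4]

for label, crit in [("linear-ish", linear_ish), ("inverted-U", inverted_u)]:
    means = binned_means(predictor, crit)
    print(label, [round(m, 2) for m in means], "monotonic:", rises_monotonically(means))
```

In the inverted-U case, the highest scorers perform worse than those in the middle, so a single cutoff on this predictor would be hard to justify.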


Based on the five guidelines above, we provide a checklist for evaluating the research rigor of ML algorithms integrated into assessment tools. HR officers can ask algorithm providers these questions to be more confident in the validity of the assessment tool of interest.


Checklist for Evaluating ML Algorithms 

• Does the ML algorithm’s training data represent individuals from protected classes, and do the features convey protected data?

• Did the developers of the ML algorithm make clear that their input training data follow the population distribution of future input data? The same question applies to the outcome and to the measurement error of predictors and outcomes.

• Is the algorithm trained on supervisory ratings of job performance? Algorithms might have learned the biases of hiring managers through the training data used to calibrate them.

• Who designs the ML algorithm matters: what was the composition of the team that created the algorithm? Were the computer scientists informed by measurement and legal experts?

• Do the assessment scores produced by the ML algorithm conform to established psychological theory? If in psychological theory the predictors relate to the predicted outcome linearly (e.g., conscientiousness predicts job performance), then the ML algorithm scores must reproduce this expected relationship.

• Does the ML algorithm use human judges to define the features to be extracted from the raw data? Defining the features is like defining the predictors in psychological assessment and is subject to proof of job relatedness. Decisions on creating features directly affect whether an algorithm might turn out to be fair or not.

• Does the algorithm use substitute variables or data (called proxies) for variables because the intended variables were unavailable? If so, are these proxies evaluated for systematic bias? For example, Volpone and colleagues (2015) found that using job candidates’ credit scores disproportionately disadvantaged individuals of color; more race-neutral variables were recommended instead.

• Was the ML algorithm continuously improved? For example, in a selection setting, if applicant responses are transcribed from voice to text and then analyzed using an algorithm, it would be important to evaluate intergroup scores to assess whether there are systematic differences due to different accents.
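For the last two questions, a common first check is the standardized mean difference (Cohen’s d) between groups, computed on a proxy variable such as a credit score or on the algorithm’s scores themselves. The sketch below uses the standard pooled-SD formula on hypothetical interview scores split by accent group; what size of d should trigger concern is a judgment call that this sketch does not settle.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical interview scores after voice-to-text transcription, split by accent group.
accent_a = [70, 72, 68, 75, 71]
accent_b = [64, 66, 63, 69, 65]
print(f"d = {cohens_d(accent_a, accent_b):.2f}")
```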


Discovering Algorithmic Bias 


This section sheds light on why and how algorithms become biased. We have included it with the awareness that it might be somewhat complex depending on the professional backgrounds and prior knowledge of some of our readers. Nevertheless, we believe that it is in the interest of these readers to know more about discovering algorithmic bias even if they do not fully grasp the intricacies. For example, HR practitioners can use information in this section to come up with talking points and targeted questions for vendors of ML algorithms.


ML algorithms try to learn patterns in the data. In other words, they try to approximate a function that defines a relationship between input variables (features) and the output label that we want to predict. In this process, assumptions are made that can lead to algorithmic bias. Assumptions depend on each algorithm, but a few examples include assuming that input variables are independent of each other, that observations are independent and identically distributed draws from an unknown probability distribution, and, for a linear classifier, that the decision boundaries are linear. However, without assumptions an algorithm would perform no better than a random guess, a principle known as the “No Free Lunch” theorem.


There could be numerous reasons for having biased data. It could be due to over- or undersampling, outdated data, a feature distribution that is not representative of the population, substitute data, limited features, incorrect output labels, biases and injustices in the world, and so on. For example, Bolukbasi et al. (2016) worked on debiasing word embeddings, which represent each word as a vector of about 300 dimensions based on the textual contexts in which the word is found. In such embeddings, vectors of words like “queen,” “princess,” “female,” and “woman” would be much closer to each other than to unrelated words. These word embeddings were trained on Google News articles, which would have represented a woman as one who takes care of the family and a man as the one who earns for the family. Consequently, the embeddings exhibited gender stereotypes such as “A man is to computer programmer as woman is to homemaker.” ML algorithms using word embeddings are widespread, which also raises serious concerns about amplifying bias.
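The geometry behind this can be illustrated with toy vectors. The sketch below measures how strongly a word vector aligns with a gender direction (taken here as the difference between “he” and “she”) and then applies a “neutralize” step that removes the component along that direction, which is the core idea of the hard-debiasing approach of Bolukbasi et al. (2016). The 3-dimensional vectors are invented for illustration (real embeddings have about 300 dimensions), and the published method also estimates the direction from several word pairs and equalizes pairs, which is not shown.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def scale(u, k):
    return [a * k for a in u]


def norm(u):
    return sqrt(dot(u, u))


def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))


def neutralize(word_vec, direction):
    """Remove the component of word_vec that lies along the bias direction."""
    unit = scale(direction, 1 / norm(direction))
    return sub(word_vec, scale(unit, dot(word_vec, unit)))


# Toy 3-d vectors (hypothetical; real embeddings have about 300 dimensions).
emb = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.3],
    "homemaker": [-0.5, 0.3, 0.9],
}
gender_dir = sub(emb["he"], emb["she"])
before = cosine(emb["programmer"], gender_dir)
after = cosine(neutralize(emb["programmer"], gender_dir), gender_dir)
print(f"programmer vs. gender direction: before={before:.2f}, after={after:.2f}")
```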


Companies and users of ML algorithms can take some precautionary measures so that everyone has an equal opportunity and selection is not based on sensitive attributes. One question that frequently arises is why we don’t remove the sensitive attributes and then train the ML algorithms. The problem is that biases can creep in indirectly through other variables, like zip code, which can be indicative of individuals’ race. In feature engineering (the process of extracting important features from raw data), an important step is to remove highly correlated (positively or negatively) variables, because keeping them would impart the same information to the model and hence add complexity and possible errors to the model. For example, zip code and percentage of college graduates are highly correlated, and thus one of these variables should be removed.


Apart from descriptive analysis, there are freely available tools and services to evaluate the fairness of ML algorithms. For example, FairTest and Themis enable developers or auditing entities to determine associations between sensitive attributes and the benefit or discrimination an ML algorithm generates. These tools conduct group and individual experiments to determine a link between inputs and outputs. They also reveal whether the protected attributes played any role in decision making.


ML algorithms can themselves be used to discover discrimination. Such algorithms include, but are not limited to:

• Classification rule mining: finding association rules between protected attributes and the output that hold with high confidence. For example, if the model outputs with high confidence a rule that “if a person is of race x, they are not eligible to move to the next round,” it will indicate that the decision making was biased and discriminatory toward a certain group of people.

• K-nearest neighbor classification: finding people who are similar on unprotected attributes and checking whether protected attributes made the difference in the outcome.

• Bayesian networks: given that an event happened, estimating the likelihood that one of its known causes was the contributing factor; for example, given that the grass is wet, what is the probability that it was due to rain or to the sprinkler?

• Probabilistic causation: checking whether certain protected attributes increase the probability of an outcome, everything else being equal.
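The first two approaches can be sketched in a few lines. `rule_confidence` computes the confidence of a rule such as “protected group → rejected” and its lift over the overall rejection rate; `knn_situation_test` is a simplified version of k-nearest-neighbor situation testing, which compares how often an applicant’s most similar peers (on non-protected features) were rejected inside versus outside the protected group. The records, feature choice, k = 2, and the 0.5 threshold are all hypothetical, and this is an illustration of the ideas rather than the implementation used by any particular tool.

```python
from math import dist  # Python 3.8+


def rule_confidence(records, antecedent, consequent):
    """Confidence P(C | A) of the rule A -> C, and its lift over the base rate P(C)."""
    covered = [r for r in records if antecedent(r)]
    confidence = sum(consequent(r) for r in covered) / len(covered)
    base_rate = sum(consequent(r) for r in records) / len(records)
    return confidence, confidence / base_rate


def _rejection_rate_near(records, i, protected_value, k):
    """Share rejected among the k records nearest to record i (by non-protected
    features) that have the given protected-group value, excluding record i."""
    me = records[i]
    pool = [s for j, s in enumerate(records) if j != i and s["protected"] == protected_value]
    nearest = sorted(pool, key=lambda s: dist(s["features"], me["features"]))[:k]
    return sum(s["rejected"] for s in nearest) / len(nearest)


def knn_situation_test(records, k=2, threshold=0.5):
    """Indices of rejected protected-group members whose similar protected peers
    were rejected much more often than their similar unprotected peers."""
    flagged = []
    for i, r in enumerate(records):
        if r["protected"] and r["rejected"]:
            gap = (_rejection_rate_near(records, i, True, k)
                   - _rejection_rate_near(records, i, False, k))
            if gap >= threshold:
                flagged.append(i)
    return flagged


# Hypothetical applicants: features = (years of experience, test score).
records = [
    {"features": (5, 80), "protected": True, "rejected": True},
    {"features": (5, 81), "protected": True, "rejected": True},
    {"features": (6, 79), "protected": True, "rejected": True},
    {"features": (5, 80), "protected": False, "rejected": False},
    {"features": (6, 81), "protected": False, "rejected": False},
    {"features": (5, 79), "protected": False, "rejected": False},
    {"features": (2, 60), "protected": False, "rejected": True},
    {"features": (2, 61), "protected": True, "rejected": True},
    {"features": (2, 62), "protected": False, "rejected": True},
]
conf, lift = rule_confidence(records, lambda r: r["protected"], lambda r: r["rejected"])
print(f"rule 'protected -> rejected': confidence={conf:.2f}, lift={lift:.2f}")
print("flagged by situation testing:", knn_situation_test(records))
```

The low-scoring protected applicant (index 7) is not flagged, because similar unprotected applicants were also rejected; the three strong protected applicants are flagged, because their unprotected look-alikes were all selected.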


Legal Implications Related to the Use of Algorithms


Organizations may face legal liability for the selection tests and processes they use for employee selection, and the use of ML algorithms in hiring is no exception. In fact, the use of ML algorithms opens a relatively new door in employment law, where legal standards that govern decision processes have been outpaced by technology (Kroll et al., 2016).


There are two primary types of discrimination that may arise when ML algorithms are used in the employment context. Disparate treatment discrimination is a form of intentional discrimination. Some forms of disparate treatment discrimination are straightforward in the context of ML algorithms. For instance, disparate treatment discrimination can occur when a protected class characteristic (such as race, sex, religion, national origin, disability, or age) is used as one of the input criteria in the algorithm. This is akin to a human decision maker considering an applicant’s protected class characteristic when making a selection decision. However, other forms of disparate treatment discrimination are more subtle. Such cases arise when an ML algorithm includes an input that is a proxy for protected class membership, which effectively results in the application of different decision rules to different individuals (Kroll et al., 2016).


Disparate impact discrimination occurs when a neutral employment practice disadvantages one or more groups based on protected class membership. Disparate impact could arise in the context of ML algorithms if one protected class subgroup (e.g., men) consistently outperforms another protected class subgroup (e.g., women) on a given scoring algorithm. Plaintiff applicants demonstrate disparate impact discrimination by presenting statistical evidence indicating that the resulting performance differences are most likely the result of the scoring algorithm itself and not of mere chance or coincidence.
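Two statistics commonly appear in such evidence: the impact ratio (the focal group’s selection rate divided by the reference group’s), which the four-fifths rule of thumb in the federal Uniform Guidelines on Employee Selection Procedures compares against 0.80, and a significance test asking whether the gap in selection rates is larger than chance would explain. The sketch below uses a pooled two-proportion z-test for the second; the counts are hypothetical, and courts and experts may rely on other tests as well.

```python
from math import erfc, sqrt


def adverse_impact(selected_a, total_a, selected_b, total_b):
    """Compare selection rates of a focal group (a) with a reference group (b).

    Returns (impact ratio rate_a / rate_b, z statistic, two-sided p-value) from a
    pooled two-proportion z-test of the difference in selection rates.
    """
    rate_a, rate_b = selected_a / total_a, selected_b / total_b
    pooled = (selected_a + selected_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return rate_a / rate_b, z, p_two_sided


# Hypothetical outcome: 30 of 100 women and 50 of 100 men pass the algorithm's screen.
ratio, z, p = adverse_impact(30, 100, 50, 100)
print(f"impact ratio = {ratio:.2f}, z = {z:.2f}, p = {p:.4f}")
```

Here the ratio of 0.60 falls below 0.80 and the difference is unlikely to be due to chance alone, which is the pattern a plaintiff would try to establish.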


There are certain precautions that organizations can take to reduce their litigation risk when ML algorithms are 
part of the selection system. Specifically, algorithm users should be transparent in the design of the algorithms 
used in employee selection and should establish a bona-fide auditing process to monitor the performance of 
algorithms over time. 


Transparency in ML algorithm design involves not only the disclosure of the algorithms’ use in decision making but also transparency in their execution. Transparency holds organizations more accountable for the way an ML algorithm performs and the types of inputs it uses to produce outputs. If an organization is transparent about the types of information used to reach decisions and the way an ML algorithm makes decisions, it follows that the organization would specifically focus on ensuring that the algorithm uses only job-relevant information to make selection decisions (versus protected class information such as race or sex) and that it successfully and consistently identifies the candidates most likely to be successful on the job (versus rating candidates at random or failing to identify successful performers at all).
Furthermore, recent developments have shifted transparency from something that is recommended to something that is required. Illinois is the first state to regulate the use of ML algorithms in employment interviews (Burstein & DiPrima, 2020). The Illinois Artificial Intelligence Video Interview Act (“Video Interview Act”; 2020) requires that applicants be notified that artificial intelligence may be used to analyze the interview and consider their fit for a position. The act also requires that applicants be provided with information before the interview explaining how AI works and, in general, how it evaluates applicants. Most notably, the act requires that employers obtain consent from applicants before AI is used to evaluate their interview; AI cannot be used for applicants who do not consent.


The General Data Protection Regulation (GDPR), which took effect in May 2018 and provides specific data privacy protections for citizens of the European Union (EU), also contains provisions specific to the use of AI in employee selection. Article 22 of the GDPR prohibits decisions based solely on automated processing that produce legal effects concerning, or similarly significantly affect, an individual. There are some exceptions to this, most notably applicant consent. Articles 13-15 of the GDPR also state that data subjects must be informed of the existence of automated decision making, though there is controversy over how specific this transparency must be. Regardless, any organization that could potentially have EU citizen applicants must be aware of these transparency requirements as applied to the use of AI in employee selection. Even though the GDPR does not apply to non-EU citizens and Illinois is the first U.S. state to formalize transparency in the use of AI in the employment interview process, it is not unlikely that more states will follow suit in the future. Organizations are encouraged to proactively increase the transparency of algorithm design processes regardless of whether they are compelled by law to do so.


Even if organizations take transparent steps to reduce the potential for algorithmic bias in the development stage, it is essential that they also audit ML algorithm performance on a regular basis. One of the benefits of AI is that algorithms continuously improve their predictions. However, this also means that an ML algorithm that showed no evidence of disparate impact at the outset could begin to have disparate impact as it uses data inputs differently to produce more accurate outputs. As a result, it is imperative that organizations undertake regular reviews of algorithms and make proactive adjustments to avoid potential disparate impact liability. How regular these reviews should be is not established by guidelines, but we recommend that they be as frequent as possible and certainly more than once a year. For example, Textio, one of the companies whose algorithms are reviewed in the table above, updates its algorithm constantly to reflect the word patterns of current job postings. This enables Textio to recommend how to write a job posting free of the unconscious biases seen “right now” in job postings.
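A recurring review can be as simple as recomputing the impact ratio for each review period and flagging any period that drops below a threshold, so that drift after deployment is caught even if the launch audit was clean. In this sketch the 0.80 threshold borrows the four-fifths rule of thumb, and the period labels, group labels, counts, and the `simulate` helper are hypothetical.

```python
from collections import defaultdict


def audit_by_period(decisions, focal, reference, threshold=0.8):
    """decisions: iterable of (period, group, selected) tuples.

    Returns ({period: impact ratio}, [periods whose ratio falls below threshold]).
    """
    tallies = defaultdict(lambda: {focal: [0, 0], reference: [0, 0]})  # [selected, total]
    for period, group, selected in decisions:
        if group in (focal, reference):
            tallies[period][group][0] += int(selected)
            tallies[period][group][1] += 1
    ratios = {}
    for period in sorted(tallies):
        (sel_f, n_f), (sel_r, n_r) = tallies[period][focal], tallies[period][reference]
        if n_f == 0 or n_r == 0 or sel_r == 0:
            continue  # not enough data in this period to form a ratio
        ratios[period] = (sel_f / n_f) / (sel_r / n_r)
    flagged = [period for period, ratio in ratios.items() if ratio < threshold]
    return ratios, flagged


def simulate(period, group, n_selected, n_total):
    """Hypothetical helper: n_total decisions for one group, n_selected of them positive."""
    return [(period, group, i < n_selected) for i in range(n_total)]


decisions = (simulate("2023-Q1", "women", 20, 50) + simulate("2023-Q1", "men", 22, 50)
             + simulate("2023-Q2", "women", 18, 50) + simulate("2023-Q2", "men", 24, 50))
ratios, flagged = audit_by_period(decisions, focal="women", reference="men")
print({p: round(r, 2) for p, r in ratios.items()}, "flagged:", flagged)
```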
It is not necessary for such oversight to involve the disclosure of source code in its entirety; that is, third parties
can audit ML algorithms without concerns of revealing protected intellectual property. For instance, an auditor 
can examine changes in algorithmic outputs (e.g., an applicant score) when inputs are changed (e.g., stronger 
or weaker responses to items on an employment test) without full access to the source code. Theoretically, 
stronger responses on an employment test should yield higher applicant scores, and weaker responses on an 
employment test should yield lower applicant scores. This is something that can be tested without the need to 
be privy to everything happening in the code between the input and the output. 
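A black-box version of this check might look like the sketch below. The auditor only calls the scoring function: each item response is raised by one level in turn, and any item where the stronger response produced a lower score is recorded. The `vendor_score` function, the 1-to-5 response scale, and the responses are hypothetical; the flaw in item 2 is planted so the check has something to find.

```python
from typing import Callable, List, Sequence


def monotonicity_violations(score: Callable[[Sequence[int]], float],
                            responses: Sequence[int], max_level: int = 5) -> List[int]:
    """Raise each item response by one level at a time and return the items
    where the stronger response produced a lower score than the original."""
    base = score(responses)
    violations = []
    for item, level in enumerate(responses):
        if level >= max_level:
            continue  # already at the top of the response scale
        stronger = list(responses)
        stronger[item] = level + 1
        if score(stronger) < base:
            violations.append(item)
    return violations


# Hypothetical vendor scorer the auditor can call but not inspect.
# Item 2 is (wrongly) penalized for stronger answers.
def vendor_score(r):
    return 2 * r[0] + 1.5 * r[1] - 0.5 * r[2]


print("items violating monotonicity:", monotonicity_violations(vendor_score, [3, 4, 2]))
```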


Input-output relationships can also be tested in reverse, such as to determine whether the output (e.g., an applicant score) would be the same even if one of the inputs that should not affect the output (e.g., gender, race) were changed (Kroll et al., 2016). This type of oversight based on partial information occurs regularly within the legal system, and ML algorithms, despite their complexity relative to other selection tools, need not be an exception.
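The reverse test can be sketched as a counterfactual “flip” check: copy each applicant record, change only the attribute that should not matter, and report any applicant whose score moves. The two scoring functions and the applicant records are hypothetical. As noted earlier in this paper, such a check does not detect proxies; a model that uses zip code instead of race would pass it.

```python
def flip_test(score, applicants, attribute, values, tolerance=1e-9):
    """Return (applicant id, swapped value) pairs where changing only `attribute`
    changes the score returned by the black-box `score` function."""
    changed = []
    for applicant in applicants:
        original = score(applicant)
        for value in values:
            if value == applicant[attribute]:
                continue
            variant = {**applicant, attribute: value}  # copy with one field swapped
            if abs(score(variant) - original) > tolerance:
                changed.append((applicant["id"], value))
    return changed


# Hypothetical black-box scorers an auditor can call but not inspect.
def biased_score(a):
    return a["test"] * 10 + (5 if a["gender"] == "M" else 0)


def fair_score(a):
    return a["test"] * 10


applicants = [
    {"id": 1, "gender": "F", "test": 7},
    {"id": 2, "gender": "M", "test": 8},
]
print("biased:", flip_test(biased_score, applicants, "gender", ["F", "M"]))
print("fair:  ", flip_test(fair_score, applicants, "gender", ["F", "M"]))
```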


Notably, auditing is distinct from canceling or adjusting applicants’ scores reactively to avoid disparate impact liability. Such a practice can result in disparate treatment liability, as it involves taking an adverse action against the applicants who took and prepared for the test (e.g., U.S. Supreme Court case Ricci v. DeStefano, 2009). Auditing as a proactive strategy for detecting and responding to biased ML algorithms is exactly the type of compliance effort that employment law encourages (Kim, 2017).


The legal recommendations about algorithm transparency, data protection, and auditing do not guarantee algorithmic justice, because each application of ML algorithms must pass the justice test on its own circumstances, merits, and drawbacks. However, the recommendations are a baseline that HR practitioners should seek to establish for a responsible and potentially legally defensible use of ML algorithms in assessment contexts.


Conclusion 


In conclusion, we reiterate the basic requirements for algorithmic justice. Providers and users of ML algorithms need to disclose sufficient information about the algorithms so that the algorithms can be independently audited. From both research and workplace law perspectives, a clear and theoretically founded link should be established between the outcome (e.g., predicted job performance) and both the algorithmic features and the final assessment scores derived from them.


Albert Einstein is credited with saying that everything should be made as simple as possible, but not simpler. Similarly, an ML algorithm should be made as simple as possible for its users to understand. Assessed individuals should be able to understand what was measured and, in general terms, how the data were modeled by the ML algorithm to arrive at its final recommendations and/or decisions. In sum, from an organizational justice perspective, ML algorithms are just if:

• ML algorithms reliably classify individuals commensurate with their assessed characteristics without using features containing legally protected information.

• Assessed individuals are given the right to know how their characteristics are combined and processed in the ML algorithm, and they are allowed to voice concerns.

• The assessment results are communicated in a personalized and caring way to each individual. AI is still not evolved enough to deliver this feedback and allow a genuine human-to-human interaction.